Abstract

A change point problem occurs in many statistical applications. If there exist
change points in a model, it is harmful to carry out a statistical analysis without
accounting for them, and the results derived from such an analysis may be
misleading. There is a rich literature on change point detection. Although many methods
have been proposed for detecting multiple change points, using these methods to find multiple
change points in a large sample does not seem feasible. In this article, a connection between
multiple change point detection and variable selection through a proper segmentation of the data
sequence is established, and a novel approach is proposed to tackle the multiple change point
detection problem via the following two key steps: (1) apply recent advances in consistent
variable selection methods such as SCAD, adaptive LASSO and MCP to detect change points;
(2) employ a refinement procedure to improve the accuracy of change point estimation. Five
algorithms are hence proposed, which can detect change points in much less time and with more
accuracy than those in the literature. In addition, an optimal segmentation algorithm
based on the residual sum of squares is given. Our simulation study shows that the proposed
algorithms are computationally efficient with improved change point estimation accuracy.
The new approach is readily generalized to detect multiple change points in other models
such as generalized linear models and nonparametric models.

KEY WORDS: Adaptive LASSO; Asymptotic normality; Least squares; Linear model; MCP;
Multiple change point detection algorithm; SCAD; Variable selection.

1. Introduction 

The most popular statistical model used in practice is a linear model, which has been ex- 
tensively studied in the literature. This model is simple and can be used to approximate a 
nonlinear function locally. However there may be change points in a linear model such that 
the regression parameters may change at these points. Thus if there do exist change points 
in a linear model, the linear model is actually a segmented linear model. 

A change point problem occurs in many statistical applications in areas including
medical and health sciences, life science, meteorology, engineering, financial econometrics
and risk management. Detecting all change points is of great importance in statistical
applications. If there exists a change point, it is harmful to carry out a statistical analysis
without accounting for it, and the results derived from such an
analysis may be misleading. There is a rich literature on change point detection; see, e.g.,
Csörgő and Horváth (1997) and Chen and Gupta (2000).

Compared with the detection of one change point, locating all change points is a very
challenging problem. Although it has been studied in the literature (see Davis, Lee, and
Rodriguez-Yam (2006), Pan and Chen (2006), Kim, Yu and Feuer (2009), and Loschi, Pontel and
Cruz (2010) among others), a powerful and efficient method still needs to be explored. Thus
this paper is mainly concerned with the multiple change point detection problem in linear
regression.

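Consider a linear model with $K_0 < \infty$ multiple change points located at $a_{1,n}^{(0)}, \ldots, a_{K_0,n}^{(0)}$:

$$y_{i,n} = \sum_{j=1}^{q} x_{i,j,n}\beta_{j,0} + \sum_{\ell=1}^{K_0}\sum_{j=1}^{q} x_{i,j,n}\delta_{\ell,j}^{(0)} I(a_{\ell,n}^{(0)} < i \le n) + \varepsilon_{i,n}, \quad i = 1, \ldots, n, \qquad (1)$$

where $\{x_{i,n} = (x_{i,1,n}, \ldots, x_{i,q,n})^T\}$ is a sequence of $q$-dimensional predictors, $\beta_0 = (\beta_{1,0}, \ldots, \beta_{q,0})^T \ne 0$
is an unknown $q$-dimensional vector of regression coefficients, $K_0$ is the unknown number of
change points, $a_{1,n}^{(0)}, \ldots, a_{K_0,n}^{(0)}$ are unknown change point locations (or change points),
$\delta_\ell^{(0)} = (\delta_{\ell,1}^{(0)}, \ldots, \delta_{\ell,q}^{(0)})^T$, $1 \le \ell \le K_0$, denote unknown amounts of changes in the regression coefficient vectors at
the change points, and $\varepsilon_{1,n}, \ldots, \varepsilon_{n,n}$ are random errors. In this paper, we assume that $K_U$ is an
upper bound of $K_0$. Set $a_{K_0+1,n}^{(0)} = n$. If there is no change point, $K_0 = 0$ and the model (1)
becomes

$$y_{i,n} = \sum_{j=1}^{q} x_{i,j,n}\beta_{j,0} + \varepsilon_{i,n}, \quad i = 1, \ldots, n.$$

Otherwise, $K_0 \ge 1$, and we assume that

$$0 < a_{\ell,n}^{(0)}/n \to \tau_\ell < 1, \quad \text{for } 1 \le \ell \le K_0. \qquad (2)$$

If $K_0 \ge 2$, we assume that

$$\min_{1 \le \ell \le K_0 - 1} (\tau_{\ell+1} - \tau_\ell) > 0. \qquad (3)$$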

Here each $\tau_\ell$ is unknown. The problem studied in this paper is to estimate $K_0$ and $a_{1,n}^{(0)}, \ldots, a_{K_0,n}^{(0)}$, or in
other words to detect multiple change points. If there is no confusion, the superscript "(0)",
subscript "0", and subscript $n$ will be suppressed.

For detecting multiple change points, it may be convenient to consider the following linear
model with probable multiple change points located at $1 < a_{1,n} < \cdots < a_{K,n} < n$:

$$y_i = x_i^T\beta + \sum_{\ell=1}^{K} x_i^T\delta_\ell I(a_{\ell,n} < i \le n) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (4)$$

where $\beta, \delta_1, \ldots, \delta_K$ are unknown $q$-dimensional parameter vectors. We can instead test the
following null hypothesis:

$H_0$: There is no change point, i.e., for any $1 < a_{1,n} < \cdots < a_{K,n} < n$,
$\delta_\ell = (\delta_{\ell,1}, \ldots, \delta_{\ell,q})^T = 0$ for any $\ell \in \{1, \ldots, K\}$, where $1 \le K \le K_U$,

versus the alternative hypothesis:

$H_1$: There exist $1 \le K \le K_U$ change points, i.e., there exist $1 < a_{1,n} < \cdots < a_{K,n} < n$
such that $\delta_\ell = (\delta_{\ell,1}, \ldots, \delta_{\ell,q})^T \ne 0$ for any $\ell \in \{1, \ldots, K\}$.



Many classical methods have been given in the literature for detecting change points, which
include the popular model selection based change point detection method and the well known
cumulative sum (CUSUM) method. However, the amounts of computing time required by
these two typical change point detection methods are respectively $O(2^n)$ and $O(n^2)$. When $n$
is very large, using these methods to find multiple change points does not seem feasible.

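If the set of all true change points in the model (4) is a subset of $\{a_{\ell,n}, 1 \le \ell \le K\}$, it is
easy to see that $a_{j,n}$ is a change point if and only if $\delta_j \ne 0$. We rewrite (4) as follows:

$$y_n = X_n \boldsymbol{\beta} + \varepsilon_n, \qquad (5)$$

where $y_n = (y_1, y_2, \ldots, y_n)^T$, $\boldsymbol{\beta} = (\beta^T, \delta_1^T, \ldots, \delta_K^T)^T$, $\varepsilon_n = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$, and

$$X_n = \begin{pmatrix} X_{(0,1)} & 0_{(0,1)} & 0_{(0,1)} & \cdots & 0_{(0,1)} \\ X_{(1,2)} & X_{(1,2)} & 0_{(1,2)} & \cdots & 0_{(1,2)} \\ \vdots & \vdots & \vdots & & \vdots \\ X_{(K,K+1)} & X_{(K,K+1)} & X_{(K,K+1)} & \cdots & X_{(K,K+1)} \end{pmatrix}_{n \times (K+1)q}$$

where $0_{(j-1,j)}$ is a zero matrix of dimension $(a_{j,n} - a_{j-1,n}) \times q$, $a_{0,n} = 0$, $a_{K+1,n} = n$, and

$$X_{(j-1,j)} = \begin{pmatrix} x_{a_{j-1,n}+1,1} & \cdots & x_{a_{j-1,n}+1,q} \\ \vdots & & \vdots \\ x_{a_{j,n},1} & \cdots & x_{a_{j,n},q} \end{pmatrix}_{(a_{j,n} - a_{j-1,n}) \times q} \quad \text{for } j = 1, \ldots, K+1.$$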



Thus detecting all the true change points and removing the pseudo change points in (4) can
be considered as a variable selection problem for the linear regression model (5), and we
may tackle the problem by employing variable selection methods. This leads us to explore
the possibility of first properly segmenting the data sequence and then applying variable selection
methods and/or other methods for detecting probable multiple change points.
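
As an illustration of the block structure of the design matrix in (5), the following minimal Python sketch builds $X_n$ row by row from the predictors and a list of candidate change points. The function name `build_design_matrix` and the toy data are hypothetical and are introduced here only for illustration; they are not part of the original paper.

```python
def build_design_matrix(x, candidates):
    """Return the n x (K+1)q design matrix of model (5) as a list of rows.

    x          -- list of n predictor rows, each of length q
    candidates -- increasing candidate change points a_1 < ... < a_K (1-indexed)
    """
    rows = []
    for i, xi in enumerate(x, start=1):   # i is the 1-indexed observation number
        row = list(xi)                    # block multiplying beta
        for a in candidates:
            # the block multiplying delta_l equals x_i when a_l < i <= n, else zero
            row.extend(list(xi) if a < i else [0.0] * len(xi))
        rows.append(row)
    return rows


# Toy example (hypothetical data): n = 6, q = 2, candidates a_1 = 2 and a_2 = 4.
x_toy = [[1.0, float(i)] for i in range(1, 7)]
X_toy = build_design_matrix(x_toy, [2, 4])
```

A nonzero coefficient block for a candidate then marks it as a change point, which is what turns detection into variable selection.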



The paper is arranged as follows. The segmentation of data sequence and multiple change 
point estimation are discussed in Section 2. Five algorithms for detecting probable multiple 
change points are proposed in Section 3. Simulation studies and practical recommendations 
are given in Section 4. Two real data examples are provided in Section 5. 

Throughout the rest of the paper, $1_q = (1, \ldots, 1)^T$ is the $q$-dimensional vector of ones, $I_q$ is the
$q \times q$ identity matrix, an indicator function is written as $I(\cdot)$, the transpose of a matrix $A$
is denoted by $A^T$, and $\lfloor c \rfloor$ is the integer part of a real number $c$. For a vector $a$, $a^T$ is its
transpose, $a(j)$ is its $j$th component, and $|a|$, $\|a\|$ and $\|a\|_\infty$ are respectively its $L_1$-norm, $L_2$-norm
(Euclidean norm) and $L_\infty$-norm. If $A$ is a set, its complement and its size are denoted by
$\bar{A}$ and $|A|$, respectively. In addition, the notations "$\to_p$" and "$\to_d$" denote convergence in
probability and convergence in distribution, respectively. Furthermore, the $(1 - \alpha)$th quantile
of the chi-square distribution with $\ell$ degrees of freedom is denoted by $\chi^2_{\alpha,\ell}$.

2. Segmentation and Change Point Estimation 

For a multiple change point detection problem, the multiple change point locations are
unknown, and in practice their approximate locations within a permissible range are the main concern,
which inspires us to partition the data sequence to search for change points. We thus divide
the data sequence into $p_n + 1$ segments. Let $m = m_n = \lfloor n/(p_n + 1) \rfloor$. The segmentation is
such that the first segment has length $n - p_n m$ with $m \le n - p_n m \le c_0 m$ for some $c_0 \ge 1$, and each
of the remaining $p_n$ segments has length $m$. Without loss of generality, we assume that $p_n \to \infty$ as
$n \to \infty$. The partition of the data sequence yields the following segmented regression model:



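$$y_i = x_i^T\Big(\beta + \sum_{\ell=1}^{p_n} \big[d_\ell I(n - (p_n - \ell + 1)m < i \le n) + \omega_\ell(i)\big]\Big) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (6)$$

where the two sets $\{d_1, \ldots, d_{p_n}\}$ and $\{0, \delta_1, \ldots, \delta_{K_0}\}$ are equal, and $\{\omega_\ell\}$ are defined as follows:
if there is a change point located in $\{n - (p_n - \ell + 1)m + 1, \ldots, n - (p_n - \ell)m - 1\}$, say $a_k^{(0)}$,
then

$$\omega_\ell(i) = \begin{cases} -d_\ell, & n - (p_n - \ell + 1)m < i \le a_k^{(0)}, \\ 0, & \text{elsewhere;} \end{cases}$$

otherwise,

$$\omega_\ell(i) = 0, \quad i = 1, \ldots, n.$$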



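The model (6) can be written as

$$y_n = X_n\theta_n + X_n^{\omega}\sum_{\ell=1}^{p_n}\omega_\ell + \varepsilon_n, \qquad (7)$$

where $y_n$ and $\varepsilon_n$ are defined in Section 1, $\theta_n = (\theta_1, \ldots, \theta_{q(p_n+1)})^T = (\beta^T, d_1^T, \ldots, d_{p_n}^T)^T$,
$d_r = (d_{r,1}, \ldots, d_{r,q})^T$, $r = 1, \ldots, p_n$,

$$X_n = \begin{pmatrix} X_{(1)} & 0 & 0 & \cdots & 0 \\ X_{(2)} & X_{(2)} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ X_{(p_n+1)} & X_{(p_n+1)} & X_{(p_n+1)} & \cdots & X_{(p_n+1)} \end{pmatrix}_{n \times (p_n+1)q} \qquad (8)$$

with $j$th block column $X_n^{(j)} = (0, \ldots, 0, X_{(j)}^T, \ldots, X_{(p_n+1)}^T)^T$, where each $0$ denotes a zero block of conformable dimension,

$$X_{(1)} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,q} \\ \vdots & & \vdots \\ x_{n-p_n m,1} & \cdots & x_{n-p_n m,q} \end{pmatrix}_{(n - p_n m) \times q},$$

$$X_{(j)} = \begin{pmatrix} x_{n-(p_n-j+2)m+1,1} & \cdots & x_{n-(p_n-j+2)m+1,q} \\ \vdots & & \vdots \\ x_{n-(p_n-j+1)m,1} & \cdots & x_{n-(p_n-j+1)m,q} \end{pmatrix}_{m \times q} \quad \text{for } j = 2, \ldots, p_n + 1,$$

$X_n^{\omega} = \mathrm{diag}(x_1^T, \ldots, x_n^T)$, and $\omega_\ell = (\omega_\ell(1)^T, \ldots, \omega_\ell(n)^T)^T$. It is easy to see that $x_n^{\omega} = X_n^{\omega}\sum_{\ell=1}^{p_n}\omega_\ell$
is an $n$-dimensional vector and all its elements, excluding at most $K_0(m - 1)$ of them, are zeros.
It is noted that in Harchaoui and Lévy-Leduc (2008), the mean-shift model is considered and
the length of each of their segments is only 1.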
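
The index bookkeeping of this segmentation can be sketched in Python as follows. The helper `segment_bounds` is a hypothetical name used only for illustration; it returns the segment length $m$ together with the 1-indexed inclusive bounds of the $p_n + 1$ segments, the first of which has length $n - p_n m$.

```python
def segment_bounds(n, p_n):
    """Split 1..n into p_n + 1 segments given as (start, end), 1-indexed inclusive.

    The first segment has length n - p_n * m and each of the remaining p_n
    segments has length m = floor(n / (p_n + 1)).
    """
    m = n // (p_n + 1)
    bounds = [(1, n - p_n * m)]
    for r in range(1, p_n + 1):
        start = n - (p_n - r + 1) * m + 1   # first index of the (r+1)th segment
        bounds.append((start, start + m - 1))
    return m, bounds


# Example with n = 5000 and p_n = floor(n / 50) = 100.
m, bounds = segment_bounds(5000, 5000 // 50)
```

With these values the first segment is 1..100 and every later segment has length 49, so the requirement $m \le n - p_n m \le c_0 m$ holds for any $c_0 \ge 100/49$.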

Consider the special case in which each true change point is at an end of a segment. Then an end
of a segment is a true change point if and only if the corresponding $d_r \ne 0$. Thus locating
all the true change points in (1) is equivalent to carrying out variable selection. Since $p_n \to \infty$,
we may take advantage of the recent advances in consistent variable selection methods for a
linear regression model such as (7) with a large number of regression coefficients, which include
the SCAD (Fan and Li (2001)), the adaptive LASSO (Zou (2006)), and the MCP (Zhang
(2010)) among others.

Let us examine the relationship between the models (1) and (7). It can be seen that under
the null hypothesis $H_0$, $\beta = \beta_0$ and $d_r = 0$, $r \in \{1, \ldots, p_n\}$. We now assume that $H_1$ holds.
Thus, there exist $\{r_k, k = 1, \ldots, K_0\}$ such that $a_{k,n} \in \{n - p_n m + (r_k - 1)m, \ldots, n - p_n m + r_k m - 1\}$.
Since $K_0$ is finite with an upper bound $K_U$, in view of (2) and (3), it follows that

$$\beta = \beta_0, \quad d_{r_k - 1} = 0, \quad d_{r_k} = \delta_k \ne 0, \quad \text{and} \quad d_{r_k + 1} = 0 \qquad (9)$$

for large $n$. Thus in order to detect all the change points $\{a_{1,n}, \ldots, a_{K_0,n}\}$, we may estimate
$\{d_\ell\}$ in advance.

The following assumptions are made for investigating the asymptotic properties of the
estimates of $\{d_\ell\}$:

Assumption C1. $\sum_{i=s}^{t} x_i x_i^T/(t - s) \to W > 0$ as $t - s \to \infty$.

It is noted that Assumption C1 is a common assumption made in change point analysis for a
mean shift model. Under Assumption C1, it can be shown that $X_{(1)}^T X_{(1)}/(n - p_n m) \to W > 0$,
and $X_{(i)}^T X_{(i)}/m \to W > 0$ for $i \in \{2, \ldots, p_n + 1\}$.

Remark 1. Assumption C1 is similar to Condition (b) in Zou (2006). If we only consider
the consistency of change point estimators, Assumption C1 can be relaxed to the following
weaker one: for $b_1, b_2 > 0$, $b_1 I_q \le \sum_{i=s}^{t} x_i x_i^T/(t - s) \le b_2 I_q$ when $t - s$ is large enough.

Assumption C2. $\{\varepsilon_i, i = 1, 2, \ldots\}$ is a sequence of independently and identically
distributed (i.i.d.) random variables with mean 0 and variance $\sigma^2$.

Remark 2. This assumption can be replaced by a weaker assumption of the strong mixing
condition in (2.1) in Kuelbs and Philipp (1980), which adapts to the autoregressive models in
Davis, Huang and Yao (1995) and Wang, Li and Tsai (2007). Let $\{\varepsilon_i, i = 1, 2, \ldots\}$ be a weak
sense stationary sequence of random variables with mean 0 and $(2 + \delta)$th moments, for $0 < \delta \le 1$,
that are uniformly bounded by some positive constant. Suppose that $\{\varepsilon_i, i = 1, 2, \ldots\}$ satisfies
the strong mixing condition $|P(AB) - P(A)P(B)| \le \rho(n) \downarrow 0$ for all $n, s \ge 1$, all $A \in \mathcal{M}_1^s$
and $B \in \mathcal{M}_{s+n}^{\infty}$, where $\mathcal{M}_a^b$ is the $\sigma$-field generated by the random vectors $\varepsilon_a, \varepsilon_{a+1}, \ldots, \varepsilon_b$,
and $\rho(n) \ll n^{-(1+\iota)(1+2/\delta)}$ for some $\iota > 0$. Then Theorem 4 and Lemma 3.4 in Kuelbs and
Philipp (1980) warrant the same results as given in Theorems 1-3 below.

For simplicity of presentation, we assume in this paper that each $X_{(j)}^T X_{(j)}$, $j = 1, \ldots, p_n + 1$, is of full rank.
If one of them is not of full rank, the Moore-Penrose inverse can be used instead of the matrix
inverse.

2.1. Estimate $\{d_\ell\}$ by least squares

By the least squares method, we estimate $d_r$, $r = 1, \ldots, p_n$, as follows:

$$\hat{d}_r = (X_{(r+1)}^T X_{(r+1)})^{-1} X_{(r+1)}^T y^{(r+1)} - (X_{(r)}^T X_{(r)})^{-1} X_{(r)}^T y^{(r)}, \quad r = 1, \ldots, p_n, \qquad (10)$$

where $y^{(1)} = (y_1, \ldots, y_{n - p_n m})^T$, and $y^{(r)} = (y_{n-(p_n-r+2)m+1}, \ldots, y_{n-(p_n-r+1)m})^T$, $r = 2, \ldots, p_n + 1$.
It is obvious that under $H_0$, for any $\ell \in \{1, \ldots, p_n\}$ and any $i \in \{n - p_n m + 1, \ldots, n\}$,
$\omega_\ell(i) = 0$ and $d_\ell = 0$.

We have the following theorem.

Theorem 1. Assume that $m \to \infty$ as $n \to \infty$. If $H_0$ holds, under Assumptions C1-C2,
it follows that

$$\sqrt{m}\,\hat{d}_\ell \to_d N(0, 2\sigma^2 W^{-1}), \quad \ell = 1, \ldots, p_n.$$
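
To make (10) and the chi-square check suggested by Theorem 1 concrete, here is a minimal pure-Python sketch. It assumes a known or pre-estimated error variance `sigma2`, solves the normal equations by a small Gaussian elimination, and uses the tabulated upper 5% point 5.991 of the chi-square distribution with 2 degrees of freedom. All function names and the noise-free toy data are hypothetical and serve only as illustration.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(A[r]) + [b[r]] for r in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))) / M[r][r]
    return z


def gram(X):
    """Return X'X for a design given as a list of rows."""
    q = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(q)] for a in range(q)]


def ols(X, y):
    """Least squares coefficients (X'X)^{-1} X'y."""
    q = len(X[0])
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(q)]
    return solve(gram(X), Xty)


def ls_jump_statistic(X_prev, y_prev, X_next, y_next, sigma2):
    """Return d_hat as in (10) and d_hat' X_next' X_next d_hat / (2 * sigma2)."""
    d_hat = [bn - bp for bn, bp in zip(ols(X_next, y_next), ols(X_prev, y_prev))]
    G = gram(X_next)
    q = len(d_hat)
    quad = sum(d_hat[a] * G[a][b] * d_hat[b] for a in range(q) for b in range(q))
    return d_hat, quad / (2.0 * sigma2)


# Toy example (hypothetical, noise-free): coefficients jump from (1, 2) to (1.5, 1).
x_vals = [float(v) for v in range(10)]
X_seg = [[1.0, v] for v in x_vals]
y_prev = [1.0 + 2.0 * v for v in x_vals]
y_next = [1.5 + 1.0 * v for v in x_vals]
d_hat, stat = ls_jump_statistic(X_seg, y_prev, X_seg, y_next, sigma2=1.0)
reject = stat >= 5.991   # upper 5% point of the chi-square distribution with 2 d.f.
```

In practice `sigma2` would be replaced by an estimate, and the critical value by the quantile matching $q$ and the chosen level.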



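We now assume that $H_1$ holds. In view of (9), it follows that $d_{r_k} + d_{r_k+1} = \delta_k$. By the
definition of $\{\omega_\ell(i)\}$, we have

$$\sum_{\ell=1}^{p_n} \omega_\ell(i) I(n - (p_n - \ell + 1)m < i \le n - (p_n - \ell)m) = \begin{cases} -\delta_k, & \text{if } \exists\, r_k \text{ such that } n - (p_n - r_k + 1)m < i \le a_{k,n} < n - (p_n - r_k)m, \\ 0, & \text{otherwise.} \end{cases} \qquad (11)$$

It can also be verified that

$$\sum_{\ell=1}^{p_n} d_\ell I(n - (p_n - \ell + 1)m < i \le n) = \begin{cases} \sum_{\ell=1}^{r_k - 1} d_\ell, & \text{if } n - (p_n - r_k + 2)m < i \le n - (p_n - r_k + 1)m, \\ \sum_{\ell=1}^{r_k + 1} d_\ell, & \text{if } n - (p_n - r_k)m < i \le n - (p_n - r_k - 1)m. \end{cases} \qquad (12)$$

Thus, we have the following theorem:

Theorem 2. If Assumptions C1-C2 hold, then under $H_1$,

$$\sqrt{m}\,(\hat{d}_{r_k} + \hat{d}_{r_k+1} - \delta_k) \to_d N(0, 2\sigma^2 W^{-1}), \quad k = 1, \ldots, K_0.$$

The proofs of Theorems 1-2 follow from the least squares theory. The details are omitted.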

2.2. Estimate $\{d_\ell\}$ by recent advances in consistent variable selection methods

2.2.1. Estimate $\{d_\ell\}$ by the adaptive LASSO

The adaptive LASSO, extending the LASSO in Tibshirani (1996), was proposed in Zou
(2006) and possesses oracle properties for a fixed number of regression coefficients.

In light of Zou (2006), the adaptive LASSO type estimator of $\theta_n$ for the model (7) is
defined by

$$\hat{\theta}_n = \arg\min_{\theta_n}\Big\{\|y_n - X_n\theta_n\|^2 + \lambda_n \sum_{r=1}^{p_n}\sum_{j=1}^{q} \frac{|d_{r,j}|}{|\tilde{d}_{r,j}|^{\nu}}\Big\}, \qquad (13)$$

where $\nu > 0$, $\lambda_n$ is a thresholding parameter and $\tilde{d}_r$ ($r = 1, \ldots, p_n$) are initial estimators
satisfying certain conditions.

Remark 3. The adaptive LASSO estimate of $\theta_n$ may also be defined by

$$\tilde{\theta}_n = \arg\min_{\theta_n}\Big\{\|y_n - X_n\theta_n\|^2 + \lambda_n \sum_{r=1}^{p_n}\sum_{j=1}^{q} \frac{|d_{r,j}|}{|\tilde{d}_{r,j}|^{\nu}} + \gamma_n \sum_{j=1}^{q} \frac{|\beta_j|}{|\tilde{\beta}_{0,j}|^{\mu}}\Big\}, \qquad (14)$$

where $\mu > 0$, and $\lambda_n$ and $\gamma_n$ are thresholding parameters satisfying certain conditions. The
difference between (13) and (14) is that variable selection, in addition to multiple
change point detection, is also considered in (14). Due to the similarity in the techniques for
finding the asymptotic behavior of both $\hat{\theta}_n$ and $\tilde{\theta}_n$, we only consider $\hat{\theta}_n$ in this paper for
simplicity of presentation.

Since the dimension of $\theta_n$ increases with $n$ in (7), the asymptotic results in Zou (2006)
are not applicable here. In the following we will investigate the limiting behavior of those
$\hat{d}_i$'s associated with change points under the condition that $K_0 \ge 1$, i.e., there exists at least
one change point in the model (1). As stated before, the subscript $n$ may be suppressed for
convenience if there is no confusion.

Before we proceed, we define some notation as follows. Let $B = \{\kappa_1, \kappa_2, \ldots, \kappa_t\} \subset
\{2, \ldots, p_n + 1\}$ such that $\kappa_1 < \cdots < \kappa_t$. Denote by $\theta_B$ the subvector of $\theta_n$ corresponding to the
block columns indexed by $B$, and let $X_B = (X_n^{(\kappa_1)}, \ldots, X_n^{(\kappa_t)})$,
where $\{X_n^{(j)}\}$ are given in (8).

Recall that for each $\delta_k$ in (1), there exists $r_k$ such that $d_{r_k} = \delta_k$, or equivalently there
exists a change point within $\{n - (p_n - r_k + 1)m, \ldots, n - (p_n - r_k)m - 1\}$ for $k = 1, \ldots, K_0$.
Define

$A_c = \{i : d_{i-1} = 0, d_i \ne 0, d_{i+1} = 0\}$, $A_1 = \{i : d_{i-1} \ne 0, d_i = 0, d_{i+1} = 0\}$,

$A_2 = \{i : d_{i-1} = 0, d_i = 0, d_{i+1} \ne 0\}$, $A_3 = \{i : d_{i-1} = 0, d_i = 0, d_{i+1} = 0\}$.

It is easy to see that for large $n$, $\bar{A}_c = A_1 \cup A_2 \cup A_3$.

In view of Zou (2006) and Huang, Ma and Zhang (2008), we need to make some assumptions
on the initial estimators $\{\tilde{d}_i\}$ used in (13) for investigating the asymptotic properties of
$\hat{\theta}_n$. By Remark 1 of Zou (2006), one might assume that for any $i$, there is a sequence
$\{a_n\}$ such that $a_n \to \infty$ and $a_n(\tilde{d}_i - d_i) = O_p(1)$. But $p_n$ is fixed in Zou (2006). Huang,
Ma and Zhang (2008) allow $p_n \to \infty$ as $n \to \infty$. Thus a stronger assumption like
$r_n \max_i |\tilde{d}_i - d_i| = O_p(1)$ as $r_n \to \infty$ (see (A2) of Huang, Ma and Zhang 2008) might be
made. However, such assumptions may not be enough for the multiple change point detection
problem. A careful study shows that we need to put some lower bound on $|\tilde{d}_i|$ for $i \in A_c$ such
that they are not close to 0. Hence we make the following assumption on $\{\tilde{d}_r\}$:

Assumption C3. There exists a constant $a > 0$ such that for large $n$,

$$|\tilde{d}_i| \ge a > 0 \ \text{ for } i \in A_c, \qquad |\tilde{d}_i| = O_p(1/\sqrt{m}) \ \text{ for } i \in \bar{A}_c.$$

To obtain $\{\tilde{d}_i\}$ in practice, we can estimate the set $A_c$ first, which, for example, may
be estimated by the least squares based multiple change point detection algorithm given in
Subsection 3.1. After we obtain the estimate $\hat{A}_c$ of $A_c$, we can set $\tilde{d}_i = c 1_q$ for $i \in \hat{A}_c$, and
$\tilde{d}_i = 1_q/\sqrt{m}$ otherwise.

To study the asymptotic behavior of $\hat{\theta}_n$, the following three lemmas are necessary.

Lemma 1. Under Assumption C1, there exists a positive definite matrix $W_{A_c}$ (defined in
(A.4) in the appendix) such that $X_{A_c}^T X_{A_c}/n \to W_{A_c}$.

Remark 4. One cannot replace $X_{A_c}^T X_{A_c}$ by $X_n^T X_n$ above since the minimum eigenvalue
may converge to 0 in consideration of the fact that $p_n \to \infty$ (see Condition (b) in Zou (2006)
and (2.13) in Zhang and Huang (2008)). Thus if they allow $p_n \to \infty$, their conditions no
longer hold and may be strengthened as Assumption C1.

Lemma 2. Under Assumption C1, for large $n$ the elements of $X_n^T x_n^{\omega}/m$ are uniformly bounded.

Lemma 3. Under Assumptions C1-C2, for large $n$ the elements of $X_n^T \varepsilon_n/\sqrt{n}$ are uniformly
bounded in probability.

If there exists at least one change point, i.e., $K_0 \ge 1$, the limiting behavior of the adaptive
LASSO estimator $\hat{\theta}_n$ is given in the following theorem.

Theorem 3. Assume that $\lambda_n/\sqrt{n} \to 0$, $m/\sqrt{n} \to 0$ and $\lambda_n (n/p_n)^{\nu/2}/\sqrt{n} \to \infty$ for $\nu > 0$
as $n \to \infty$. If Assumptions C1-C3 hold, then

Remark 5. If we replace the weight function $|x|^{\nu}$ by $\exp(-1/|x|)$ in (13), the condition
$\lambda_n (n/p_n)^{\nu/2}/\sqrt{n} \to \infty$ can be relaxed to the weaker condition $\lambda_n \exp(\sqrt{n/p_n})/\sqrt{n} \to \infty$.
Although it may result in an absorbing state at $x = 0$ (see Fan and Lv (2008)), it has not
occurred in simulations.

Remark 6. By (13), $\hat{\theta}_n$ is the unique solution of a convex optimization problem and hence
the Karush-Kuhn-Tucker condition holds. For any vector $b = (b_1, \ldots, b_p)^T$, denote its sign
vector by $\mathrm{sgn}(b) = (\mathrm{sgn}(b_1), \ldots, \mathrm{sgn}(b_p))^T$, with the convention $\mathrm{sgn}(0) = 0$. As in Zhao and Yu
(2006), we say that $\hat{\theta}_n =_s \theta$ if and only if $\mathrm{sgn}(\hat{\theta}_n) = \mathrm{sgn}(\theta)$. If $p_n$ is further assumed to grow
slowly enough relative to $n$, then by Lemmas 1-3 and Theorem 3, it can be shown that

$$P(\hat{\theta}_n =_s \theta) \to 1, \quad \text{as } n \to \infty.$$

The proof is similar to the proof of Theorem 1 in Huang, Ma and Zhang (2008) and hence
omitted.

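2.2.2. Estimate $\{d_\ell\}$ by the SCAD or MCP

SCAD (Fan and Li (2001)) and MCP (Zhang (2010)) are two popular recent consistent variable
selection methods. They can also be employed to solve the multiple change point detection
problem.

Consider the following estimator of $\theta_n$:

$$\hat{\theta}_n^{p} = \arg\min_{\theta}\Big\{\|y_n - X_n\theta\|^2 + n\sum_{r=1}^{p_n} p_{\lambda,\gamma}(|d_r|)\Big\},$$

where $p_{\lambda,\gamma}$ is the penalty function with tuning parameters $\lambda > 0$ and $\gamma > 0$. If

$$p_{\lambda,\gamma}(x) = \begin{cases} \lambda x, & \text{if } x \le \lambda, \\ \dfrac{\gamma\lambda x - 0.5(x^2 + \lambda^2)}{\gamma - 1}, & \text{if } \lambda < x \le \gamma\lambda, \\ \dfrac{\lambda^2(\gamma + 1)}{2}, & \text{if } x > \gamma\lambda, \end{cases} \qquad (15)$$

the SCAD penalty function proposed by Fan and Li (2001), $\hat{\theta}_n^{p}$ is the SCAD type estimator
of $\theta_n$. Denote it by $\hat{\theta}_n^{scad}$. Instead, let

$$p_{\lambda,\gamma}(x) = \begin{cases} \lambda x - \dfrac{x^2}{2\gamma}, & \text{if } x \le \gamma\lambda, \\ \dfrac{1}{2}\gamma\lambda^2, & \text{if } x > \gamma\lambda, \end{cases} \qquad (16)$$

the MCP penalty function proposed by Zhang (2010); then $\hat{\theta}_n^{p}$ becomes the MCP type estimator
of $\theta_n$. Denote it by $\hat{\theta}_n^{mcp}$.

Under certain conditions, the asymptotic properties of both $\hat{\theta}_n^{scad}$ and $\hat{\theta}_n^{mcp}$ are similar to
the asymptotic properties of $\hat{\theta}_n$. Since the emphasis of this paper is on the algorithms for
detecting multiple change points, their asymptotic properties will not be discussed here.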
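
The two penalty functions (15) and (16) are simple to evaluate directly. The following Python sketch does so for a nonnegative argument; the function names are hypothetical, and the SCAD branch assumes $\gamma > 1$ so that the middle piece is well defined.

```python
def scad_penalty(x, lam, gamma):
    """SCAD penalty (15) evaluated at x >= 0; assumes lam > 0 and gamma > 1."""
    if x <= lam:
        return lam * x
    if x <= gamma * lam:
        return (gamma * lam * x - 0.5 * (x * x + lam * lam)) / (gamma - 1.0)
    return lam * lam * (gamma + 1.0) / 2.0


def mcp_penalty(x, lam, gamma):
    """MCP penalty (16) evaluated at x >= 0; assumes lam > 0 and gamma > 0."""
    if x <= gamma * lam:
        return lam * x - x * x / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


# Both penalties are continuous at their breakpoints and flat beyond gamma * lam.
scad_at_knot = scad_penalty(3.7, 1.0, 3.7)
mcp_at_knot = mcp_penalty(2.4, 1.0, 2.4)
```

Both penalties level off for large arguments, so large jumps are not shrunk further, unlike the plain LASSO penalty $\lambda x$.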

3. Multiple change points detection algorithms 

For a given $p_n$ or $m$, we divide the data sequence into $p_n + 1$ segments such that the first
segment has length between $m$ and $c_0 m$ with $c_0 \ge 1$ and the remaining $p_n$ segments are all of
length $m$, and we have the model (6). Define

$$\hat{\sigma}_n^2 = \sum_{i=1}^{n - p_n m} (y_i - x_i^T\hat{\beta})^2/(n - p_n m - q) \qquad (17)$$

with $\hat{\beta} = (X_{(1)}^T X_{(1)})^{-1} X_{(1)}^T y^{(1)}$. Given a significance level $\alpha$, five multiple change point
detection algorithms are proposed in this section.

3.1 Least squares based multiple change point detection algorithm

In light of Theorems 1-2, the least squares based multiple change point detection algorithm is
given as follows:

Least squares based multiple change points detection algorithm (LSMCPDA):

Step 1. Set $i = 1$, $j = 1$ and $\hat{K} = 0$.

Step 2. If $i \ge p_n - 3$, go to Step 3. Otherwise, we test the hypothesis $H_{0,i}: d_i = 0$ by
checking if

$$\hat{d}_i^T X_{(i+1)}^T X_{(i+1)} \hat{d}_i/(2\hat{\sigma}_n^2) \ge \chi^2_{\alpha,q},$$

where $\hat{d}_i$ is given in (10). If the test is significant, set $i = i + 1$ and repeat Step 2; otherwise
we test the hypothesis $H_{0,(i+1,i+2)}: d_{i+1} + d_{i+2} = 0$ by the corresponding chi-square test (see Theorem 2).
If the test is not significant, set $i = i + 1$ and repeat Step 2; otherwise, a change point estimate
is $n - p_n m + im$. Set $\hat{r}_j = n - p_n m + im$, $j = j + 1$, $i = i + 2$ and $\hat{K} = \hat{K} + 1$. Then repeat
Step 2.

Step 3. If $\hat{K} = 0$, then go to the next step. Otherwise, we use the CUSUM to improve the
accuracy of the multiple change point detection as follows: we search for the change points
within the $\hat{K}$ sets $\{n - p_n m + (\hat{r}_j - 1)m, \ldots, n - p_n m + (\hat{r}_j + 1)m\}$, $j = 1, \ldots, \hat{K}$, by the
CUSUM. An estimate of the change point within the $j$th set is given by

$$\hat{a}_{j,n} = \arg\min_{\ell}\Big\{\min_{\beta}\sum_{i = n - p_n m + (\hat{r}_j - 1)m}^{\ell} (y_i - x_i^T\beta)^2 + \min_{\beta}\sum_{i = \ell + 1}^{n - p_n m + (\hat{r}_j + 1)m} (y_i - x_i^T\beta)^2\Big\}.$$

Step 4. If $\hat{K} = 0$, there are no change points. Otherwise, there are $\hat{K}$ change points and they
are $\hat{a}_{1,n}, \ldots, \hat{a}_{\hat{K},n}$.

If, in the algorithm above, the chi-square tests in Step 2 are replaced by the CUSUM tests
(see Appendix A.1) and Step 3 is replaced by Steps 3-5 of the SMCPDA with $\{\hat{r}_j^{scad}\}$, $\{\hat{d}_\ell^{scad}\}$,
$\hat{K}^{scad}$ and $\{\hat{a}_{j,n}^{scad}\}$ replaced by $\{\hat{r}_j\}$, $\{\hat{d}_j\}$, $\hat{K}$, and $\{\hat{a}_{j,n}\}$ respectively, the new algorithm is
named CLSMCPDA, where "C" is the first letter of "CUSUM".

3.2 Adaptive LASSO based multiple change points detection algorithm

In light of Theorem 3, the adaptive LASSO based multiple change point detection algorithm
is given as follows:

Adaptive LASSO based multiple change points detection algorithm (ALMCPDA):
Step 1. Set $i = 1$, $j = 1$ and $\hat{K} = 0$. Execute the algorithm LSMCPDA and obtain its estimated
number of change points $\tilde{K}$. If $\tilde{K} > 0$, we also obtain its change point estimates $\tilde{a}_{1,n}, \ldots, \tilde{a}_{\tilde{K},n}$.

Step 2. If $\tilde{K} = 0$, set $\tilde{d}_1 = \cdots = \tilde{d}_{p_n} = 1_q/\sqrt{m}$; otherwise, set

$$\tilde{d}_\ell = \begin{cases} c 1_q, & \ell \in \{r_k : r_k m < \tilde{a}_{k,n} - n + p_n m \le (r_k + 1)m\}, \\ 1_q/\sqrt{m}, & \text{elsewhere,} \end{cases}$$

where $r_k$ is an integer such that $r_k m < \tilde{a}_{k,n} - n + p_n m \le (r_k + 1)m$ and $c$ is a prechosen
constant. Select $\lambda > 0$ and $\nu > 0$. Find the adaptive LASSO estimate of $\theta$ via

$$\hat{\theta} = \arg\min_{\theta}\Big\{\|y_n - X_n\theta\|^2 + \lambda\sum_{r=1}^{p_n}\sum_{j=1}^{q} \frac{|d_{r,j}|}{|\tilde{d}_{r,j}|^{\nu}}\Big\},$$

and we obtain $\hat{d}_\ell$ for $1 \le \ell \le p_n$.

Step 3. We compute $z_\ell = \|\hat{d}_\ell\|_\infty$ for $1 \le \ell \le p_n$. If $z_1 = z_2 = \cdots = z_{p_n} = 0$, go to
Step 5. Otherwise, we treat $\{z_\ell\}$ as random variables from the model $z = \mu + e$ with
$\mu = (\mu_1, \ldots, \mu_{p_n})^T$ and $e \sim N(0, I_{p_n})$. Use LASSO, SCAD or MCP, among other recent
advances in variable selection, to perform variable selection based on $\{z_\ell\}$. We obtain the
estimates $\{\hat{\mu}_\ell\}$. If $\hat{\mu}_\ell$, $1 \le \ell \le p_n$, are all zeros, set $\hat{K} = 0$ and go to Step 6. Otherwise, let
$\mathcal{X}$ be the subset of $\{1, \ldots, p_n\}$ such that $\ell \in \mathcal{X}$ if and only if $\hat{\mu}_\ell \ne 0$. Write $\mathcal{X} = \{s_1, \ldots, s_{|\mathcal{X}|}\}$
such that $s_1 < \cdots < s_{|\mathcal{X}|}$.

Step 4. If $i > |\mathcal{X}|$, go to Step 5. Otherwise, we test the hypothesis $H_{0,s_i}: d_{s_i} = 0$ by checking
if

$$(p_n - s_i)\,\hat{d}_{s_i}^T X_{(s_i+1)}^T X_{(s_i+1)} \hat{d}_{s_i}/(q\hat{\sigma}_n^2) \ge \chi^2_{\alpha,q},$$

where $\hat{\sigma}_n^2$ is given in (17). If the test is not significant, set $i = i + 1$ and repeat Step 4.
Otherwise, a change point estimate is $n - p_n m + (s_i - 1)m$. Set $\hat{r}_j = n - p_n m + (s_i - 1)m$,
$j = j + 1$, $i = i + 2$, and $\hat{K} = \hat{K} + 1$. Then repeat Step 4.

Step 5. If $\hat{K} = 0$, then go to the next step. Otherwise, we use the CUSUM to improve the
accuracy of the multiple change point detection as follows: we search for the change points
within the $\hat{K}$ sets $\{n - p_n m + (\hat{r}_j - 1)m, \ldots, n - p_n m + (\hat{r}_j + 1)m\}$, $j = 1, \ldots, \hat{K}$, by the
CUSUM. An estimate of the change point for the $j$th set is given by

$$\hat{a}_{j,n} = \arg\min_{\ell}\Big\{\min_{\beta}\sum_{i = n - p_n m + (\hat{r}_j - 1)m}^{\ell} (y_i - x_i^T\beta)^2 + \min_{\beta}\sum_{i = \ell + 1}^{n - p_n m + (\hat{r}_j + 1)m} (y_i - x_i^T\beta)^2\Big\}.$$

Step 6. If $\hat{K} = 0$, there are no change points. Otherwise, there are $\hat{K}$ change points and they
are $\hat{a}_{1,n}, \ldots, \hat{a}_{\hat{K},n}$.

If, in the algorithm above, the chi-square test is replaced by the CUSUM test in Step 4, the new
algorithm is named CALMCPDA, where "C" is also the first letter of "CUSUM". Denote
all the estimates based on CALMCPDA by adding a superscript "C" to the corresponding
estimates based on ALMCPDA. For example, the estimate of $K_0$ based on CALMCPDA is
denoted by $\hat{K}^C$.
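
The role of the initial estimates in Step 2 of ALMCPDA can be illustrated with a short Python sketch. It builds the initial vectors ($c 1_q$ on segments flagged by a preliminary detection, $1_q/\sqrt{m}$ elsewhere) and evaluates the weighted $L_1$ penalty term of (13). The function names, the set `flagged`, and the numbers are hypothetical and chosen only for illustration.

```python
import math


def initial_estimates(p_n, m, q, flagged, c=1.0):
    """Initial vectors d-tilde: c * 1_q on flagged segments, 1_q / sqrt(m) elsewhere."""
    small = 1.0 / math.sqrt(m)
    return [[c] * q if r in flagged else [small] * q for r in range(1, p_n + 1)]


def adaptive_lasso_penalty(d, d_tilde, lam, nu=1.0):
    """Penalty term of (13): lam * sum_r sum_j |d_rj| / |d_tilde_rj| ** nu."""
    return lam * sum(abs(v) / abs(w) ** nu
                     for dr, wr in zip(d, d_tilde) for v, w in zip(dr, wr))


# Hypothetical example: p_n = 4 segments of length m = 49, q = 3, segment 2 flagged.
d_tilde = initial_estimates(p_n=4, m=49, q=3, flagged={2})
jump = [0.5, -0.7, 0.4]
zero = [0.0, 0.0, 0.0]
pen_flagged = adaptive_lasso_penalty([zero, jump, zero, zero], d_tilde, lam=2.0)
pen_unflagged = adaptive_lasso_penalty([zero, zero, jump, zero], d_tilde, lam=2.0)
```

The same jump costs $\sqrt{m} = 7$ times more on an unflagged segment than on a flagged one, which is how the weights steer the estimate toward the preliminary change point locations.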

3.3 SCAD based multiple change points detection algorithm

Similar to the ALMCPDA, the SCAD based multiple change point detection algorithm is
given as follows:

SCAD based multiple change points detection algorithm (SMCPDA):

Step 1. Set $i = 1$, $j = 1$ and $\hat{K}^{scad} = 0$.

Step 2. Select $\lambda > 0$ and $\gamma > 0$. Find the SCAD estimate
$\hat{\theta}^{scad} = ((\hat{\beta}^{scad})^T, (\hat{d}_1^{scad})^T, \ldots, (\hat{d}_{p_n}^{scad})^T)^T$ of $\theta$ via

$$\hat{\theta}^{scad} = \arg\min_{\theta}\Big\{\|y_n - X_n\theta\|^2 + n\sum_{r=1}^{p_n} p_{\lambda,\gamma}(|d_r|)\Big\},$$

where $p_{\lambda,\gamma}$ is given in (15), and we obtain $\hat{d}_\ell^{scad}$ for $1 \le \ell \le p_n$.

Step 3. It is the same as Step 3 of ALMCPDA with $z_\ell = \|\hat{d}_\ell\|_\infty$ replaced by $z_\ell = \|\hat{d}_\ell^{scad}\|_\infty$ for
$1 \le \ell \le p_n$ and $\hat{K} = 0$ replaced by $\hat{K}^{scad} = 0$.

Step 4. If $i > |\mathcal{X}|$, go to Step 5. Otherwise, we test the hypothesis $H_{0,s_i}: d_{s_i} = 0$ by CUSUM.
If the test is not significant, set $i = i + 1$ and repeat Step 4. Otherwise, a change point
estimate is $n - p_n m + (s_i - 1)m$. Set $\hat{r}_j^{scad} = n - p_n m + (s_i - 1)m$, $j = j + 1$, $i = i + 2$, and
$\hat{K}^{scad} = \hat{K}^{scad} + 1$. Then repeat Step 4.

Step 5. If $\hat{K}^{scad} = 0$, then go to the next step. Otherwise, we use the CUSUM to improve the
accuracy of the multiple change point detection as follows: we search for the change points
within the $\hat{K}^{scad}$ sets $\{n - p_n m + (\hat{r}_j^{scad} - 1)m, \ldots, n - p_n m + (\hat{r}_j^{scad} + 1)m\}$, $j = 1, \ldots, \hat{K}^{scad}$,
by the CUSUM. An estimate of the change point for the $j$th set is given by

$$\hat{a}_{j,n}^{scad} = \arg\min_{\ell}\Big\{\min_{\beta}\sum_{i = n - p_n m + (\hat{r}_j^{scad} - 1)m}^{\ell} (y_i - x_i^T\beta)^2 + \min_{\beta}\sum_{i = \ell + 1}^{n - p_n m + (\hat{r}_j^{scad} + 1)m} (y_i - x_i^T\beta)^2\Big\}.$$

Step 6. If $\hat{K}^{scad} = 0$, there are no change points. Otherwise, there are $\hat{K}^{scad}$ change points and
they are $\hat{a}_{1,n}^{scad}, \ldots, \hat{a}_{\hat{K}^{scad},n}^{scad}$.


3.4 MCP based multiple change points detection algorithm 

The differences between the SMCPDA and the MCP based multiple change point detection
algorithm (MMCPDA) are as follows:

1. The superscript "scad" in the SMCPDA is replaced by the superscript "mcp" in the
MMCPDA.

2. Step 2 in the SMCPDA is modified to the following Step 2 in the MMCPDA:
Step 2. Select $\lambda > 0$ and $\gamma > 0$. Find the MCP estimate
$\hat{\theta}^{mcp} = ((\hat{\beta}^{mcp})^T, (\hat{d}_1^{mcp})^T, \ldots, (\hat{d}_{p_n}^{mcp})^T)^T$ of $\theta$ via

$$\hat{\theta}^{mcp} = \arg\min_{\theta}\Big\{\|y_n - X_n\theta\|^2 + n\sum_{r=1}^{p_n} p_{\lambda,\gamma}(|d_r|)\Big\},$$

where $p_{\lambda,\gamma}$ is given in (16).

Remark 7. The use of CUSUM in these algorithms is for improving the change point
estimation accuracy. The amounts of computing time required by these algorithms are all
$O(n) + O(m)$, where $O(m)$ corresponds to the time required for using the CUSUM method. If a
segmentation satisfies $m = o(n)$, then $O(n) + O(m) = O(n)$, which is computationally more
efficient than the existing multiple change point detection methods in the literature.
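
The refinement idea, locating a change point inside a short window around a coarse estimate, can be sketched in Python as follows. This is a simplified illustration, not the CUSUM procedure of the paper's appendix: it assumes that the refined location is the split minimizing the combined residual sum of squares of two separate least squares fits, and it refits from scratch at every candidate split rather than using updating formulas. All function names and the noise-free toy data are hypothetical.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(A[r]) + [b[r]] for r in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))) / M[r][r]
    return z


def rss(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    q = len(X[0])
    G = [[sum(r[a] * r[b] for r in X) for b in range(q)] for a in range(q)]
    g = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(q)]
    beta = solve(G, g)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def refine_change_point(X, y, lo, hi, min_len):
    """Best split l in the 1-indexed window lo..hi: fit lo..l and l+1..hi separately."""
    best_l, best_val = None, float("inf")
    for l in range(lo + min_len - 1, hi - min_len + 1):
        total = rss(X[lo - 1:l], y[lo - 1:l]) + rss(X[l:hi], y[l:hi])
        if total < best_val:
            best_l, best_val = l, total
    return best_l


# Toy example (hypothetical, noise-free): coefficients change after observation 12.
X_toy = [[1.0, float(i % 5)] for i in range(1, 25)]
y_toy = [1.0 + 2.0 * r[1] if i <= 12 else 3.0 - r[1]
         for i, r in enumerate(X_toy, start=1)]
a_hat = refine_change_point(X_toy, y_toy, lo=1, hi=24, min_len=3)
```

Because the search is restricted to a window of about $2m$ observations, its cost depends on $m$ rather than on $n$.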

4. Simulation study 

In this section, we present simulation studies of multiple change point analysis. Since the time
for finding the multiple change points in a large sample by the algorithms proposed in Section
3 is significantly reduced compared to the existing multiple change point detection methods
in the literature, such comparison studies are omitted in this section. We will only compare
the number of times the true number of change points is selected and the accuracy of change
point estimation by the algorithms proposed in Section 3 based on 1000 simulations. A Dell
server (two E5520 Xeon Processors, two 2.26GHz 8M Caches, 16GB Memory) is used in the
simulation.

It is noted that the LARS algorithm (Efron, Hastie, Johnstone, and Tibshirani 2004) is
used to compute $\hat{\theta}_n$ defined in (13) with $\nu = 1$ and an optimal $\lambda_n$ selected by the BIC. For
applying LARS, the added penalty weight on $\beta$ is set as $1/|1_q|$, which will not affect the multiple
change point detection results as $\beta \ne 0$. The PLUS algorithm (Zhang, 2010) with the added
penalty $n p_{\lambda,\gamma}(|\beta|)$ on $\beta$ is used to compute $\hat{\theta}_n^{scad}$ defined via (15) or $\hat{\theta}_n^{mcp}$ defined via (16), which
also does not affect the multiple change point detection results as $\beta \ne 0$. Let $\hat{\sigma}_n$ be given in
(17). We use $\lambda = \hat{\sigma}_n\sqrt{2\log p_n/n}$ in the PLUS algorithm as suggested in Zhang (2010). In
all of our numerical examples, we set $\gamma = 3.7$ for SCAD by following the recommendation of
Fan and Li (2001), but set $\gamma = 2.4$ for MCP based on some preliminary simulation studies.
It is noted that in Step 3 of the algorithms ALMCPDA, CALMCPDA, SMCPDA, and
MMCPDA, we use SCAD to perform variable selection for the model $z = \mu + e$ by applying
the PLUS algorithm with $\lambda = 0.02$. Such a small $\lambda$ is used to avoid the possibility of
overestimation of the number of multiple change points.

Throughout this section, $\alpha = 0.05$.

4.1. The case that there is no change point in the data sequence of size 5000 

In this subsection, we consider the case that there is no change point in the data sequence. 
We will examine the performance of the proposed algorithms to see if they do claim that there 
is no change point. 

Consider the following linear model

$$y_i = x_i^T\beta_0 + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $\beta_0$ is a $q \times 1$ parameter vector. Set $n = 5000$, $q = 3$, $\beta_0 = (1, 1.4, 0.7)^T$, and $x_{i,1,n} = 1$
for $i = 1, \ldots, 5000$. Generate $\varepsilon_i$, $i = 1, \ldots, 5000$, such that they are i.i.d. $N(0, 1)$ distributed,
and generate two sequences $x_{i,2,n}$, $i = 1, \ldots, n$, and $x_{i,3,n}$, $i = 1, \ldots, 5000$, such that they are i.i.d.
$N(1, 2)$ distributed. For demonstration, a sample scatter plot of simulated data is given in
Figure 1.

We compare the following five algorithms: LSMCPDA, both ALMCPDA and CALMCPDA
with $c = 1$, SMCPDA and MMCPDA. Recall that all the tests used in the algorithms
CALMCPDA, SMCPDA, and MMCPDA are based on CUSUM. The number of correct
detections and the average computation time in seconds based on 1000 simulations are given in Table
1.

Figure 1: There is no change point in the data sequence.

From Table 1, it can be seen that all algorithms perform very well. The average detection time
required by CALMCPDA for a sample of size 5000 is longer than that of the other proposed algorithms
but is only 6.78 seconds.

4.2. The case that there are nine change points in the data sequence of size 5000 

In this subsection, we consider a case that there are nine change points in the data sequence 
of size 5000. We will examine the performance of the proposed algorithms via the rate for 
correctly estimating the number of change points and the accuracy of change point estimation. 
The average computation time for multiple change point detection is also given for each 
algorithm. 
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Table 1: The entries are the numbers of correct change point detection by the five algorithms 
LSMCPDA, ALMCPDA, CALMCPDA, SMCPDA and MMCPDA and the corresponding 
average computation time based on 1000 simulations. 





                               LSMCPDA   ALMCPDA   CALMCPDA   SMCPDA   MMCPDA
No. of Correct Detection           999       996       1000     1000     1000
Average Computation Time (s)      1.42      3.84       6.78     1.93     1.98



Consider the model (1), i.e.,

$$y_{i,n} = \sum_{j=1}^{q} x_{i,j,n}\,\beta_{j,0} + \sum_{\ell=1}^{K_0} \sum_{j=1}^{q} x_{i,j,n}\,\delta_{\ell,j}\, I(a_{\ell,n} < i \le n) + \varepsilon_i, \quad i = 1, \dots, n.$$



As in Subsection 4.1, set $n = 5000$, $q = 3$, $\beta_0 = (1, 1.4, 0.7)^T$, choose $p_n = \lfloor n/50 \rfloor$ and
$m = \lfloor n/(p_n + 1) \rfloor$, and generate $\{x_{i,j,n}\}$ and $\{\varepsilon_i\}$ in the same way as in Subsection 4.1. Set
$K_0 = 9$, $\delta_1 = \delta_3 = \delta_5 = \delta_7 = \delta_9 = (0.5, -0.7, 0.4)^T$, and $\delta_2 = \delta_4 = \delta_6 = \delta_8 = -\delta_1$. Consider
the following two change point location settings:

CPL1. $a_i = 500 \times i$, for $i = 1, \dots, 9$;

CPL2. $a_1 = 503$, $a_2 = 923$, $a_3 = 1471$, $a_4 = 2077$, $a_5 = 2334$, $a_6 = 2890$, $a_7 = 3410$,
$a_8 = 3909$, and $a_9 = 4546$.
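
The two settings can be simulated as in the following sketch. It assumes that each shift applies to every observation after its change point, so that the shifts accumulate and the coefficient vector alternates between two values. It also assumes that N(1, 2) means variance 2. The function name `simulate_with_changes` and the seed are illustrative only.

```python
import numpy as np

def simulate_with_changes(change_points, n=5000, seed=0):
    """Simulate the linear model with alternating coefficient shifts."""
    rng = np.random.default_rng(seed)
    beta0 = np.array([1.0, 1.4, 0.7])
    delta1 = np.array([0.5, -0.7, 0.4])
    # Assumption: N(1, 2) denotes mean 1 and variance 2.
    X = np.column_stack([np.ones(n), rng.normal(1.0, np.sqrt(2.0), size=(n, 2))])
    # coef[i - 1] is the coefficient vector in force for observation i.
    coef = np.tile(beta0, (n, 1))
    for ell, a in enumerate(change_points, start=1):
        delta = delta1 if ell % 2 == 1 else -delta1  # odd: +delta1, even: -delta1
        coef[a:] += delta  # rows a, a+1, ... are observations i > a
    y = np.sum(X * coef, axis=1) + rng.normal(size=n)
    return X, y, coef

n = 5000
p_n = n // 50        # floor(n / 50)
m = n // (p_n + 1)   # floor(n / (p_n + 1)), the segment length
cpl1 = [500 * i for i in range(1, 10)]
cpl2 = [503, 923, 1471, 2077, 2334, 2890, 3410, 3909, 4546]
X, y, coef = simulate_with_changes(cpl1)
```

For n = 5000 this gives p_n = 100 and m = 49, so each change point in either setting falls inside a segment of 49 observations.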

For demonstration, two scatter plots of simulated data for the settings CPL1 and CPL2 are
given in Figures 2 and 3, respectively. One can hardly find any change points in these two
figures.

We compare the following five algorithms: LSMCPDA, ALMCPDA, CALMCPDA, SMCPDA
and MMCPDA. Let $\hat a_i$ stand for the estimate of $a_i$ given by any one of the five
algorithms, for $i = 1, \dots, 9$. We check the accuracy of multiple change point estimation
based on each algorithm by examining the distance between $\hat a_i$ and $a_i$ for $i = 1, \dots, 9$.
We only consider whether this distance equals 0, or is less than or equal to 5 or 10. The
simulation results for the two change point location settings CPL1 and CPL2 are presented
in Tables 2-3.

Figure 2: The scatter plot of simulated data for Setting CPL1.

From both tables, it can be seen that all algorithms perform well in terms of the accuracy of
multiple change point estimation and the rate of correctly estimating the number of change
points. ALMCPDA and CALMCPDA are comparable and generally outperform the others.
The average detection time required by CALMCPDA for a sample of size 5000 is longer than
that of all other algorithms: 8.20 seconds for CPL1 and 8.65 seconds for CPL2. In contrast,
the average detection time required by ALMCPDA is only 5.61 seconds for CPL1 and 5.97
seconds for CPL2.
Figure 3: The scatter plot of simulated data for Setting CPL2.

4.3. Practical recommendation of $p_n$

It is clear that the choice of $p_n$ will affect the performance of the proposed algorithms. Too
large a $p_n$ may tend to underestimate the true number of change points and increase
the bias in change point estimation, although it may cut down the computation time. Hence care
must be taken in choosing a proper $p_n$, and we propose the following algorithm:

Step 1. We choose an initial set $B$ containing probable values of $p_n$.

Step 2. For each $p_n$ in the set $B$, we obtain an estimate $\hat\theta_n$ of $\theta_n$ by using an algorithm,
say ALMCPDA. We can then calculate the residual sum of squares, denoted by RSS($p_n$).

Step 3. The optimal $p_n$ is chosen as $\arg\min_{p_n \in B} \mathrm{RSS}(p_n)$.
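
Steps 1 to 3 amount to a small grid search, sketched below. The argument `detect` is a hypothetical placeholder for any of the proposed algorithms (for example ALMCPDA), which are not implemented here. Computing the RSS by refitting least squares on each detected segment is likewise an assumption about how RSS($p_n$) is evaluated.

```python
import numpy as np

def segment_rss(X, y, change_points):
    """Sum of OLS residual sums of squares over the segments cut at change_points."""
    bounds = [0] + sorted(change_points) + [len(y)]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if hi == lo:
            continue  # skip empty segments
        beta, *_ = np.linalg.lstsq(X[lo:hi], y[lo:hi], rcond=None)
        resid = y[lo:hi] - X[lo:hi] @ beta
        total += float(resid @ resid)
    return total

def select_pn(X, y, candidates, detect):
    """Return the candidate p_n with the smallest RSS, plus all RSS values.

    detect(X, y, p_n) is a hypothetical callable returning change point
    locations, counted as the number of observations before each change.
    """
    rss = {p: segment_rss(X, y, detect(X, y, p)) for p in candidates}
    best = min(rss, key=rss.get)
    return best, rss

# Toy usage: a placeholder detector that returns every segment boundary.
def toy_detect(X, y, p):
    m = len(y) // (p + 1)
    return [m * k for k in range(1, p + 1)]

y = np.concatenate([np.zeros(60), np.full(60, 5.0)])
X = np.ones((120, 1))
best, rss = select_pn(X, y, [1, 2], toy_detect)
```

In the toy data the mean shifts after observation 60, so the candidate whose boundary lands there attains the smaller RSS and is selected.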

5. Empirical applications 

In this section, we consider empirical applications of the multiple change point detection
methods proposed in this paper by analyzing the U.S. ex-post real interest rate (Garcia
and Perron, 1996) and the gross domestic product in the U.S.A. (Maddala, 1977).
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5.1. The U.S. Ex-Post Real Interest Rate 

Garcia and Perron (1996) considered the time series behavior of the U.S. ex-post real interest
rate (constructed from the three-month treasury bill rate deflated by the CPI inflation rate
taken from the Citibase database). The data are a quarterly series from the first quarter of 1961
to the third quarter of 1986, which is plotted in Figure 4. We are interested in finding out if
there are change points in the mean of the series. Thus we apply the proposed algorithms to
the mean shift model. It is noted that, by Remark 2, the algorithms are applicable even if
there exists potential serial correlation.

Figure 4: U.S. Ex-Post Real Interest Rate, the first quarter of 1961 - the third quarter of 
1986 



First, we need to select a $p_n$. Following the recommendations in Subsection 4.3, we will
choose an optimal $p_n$ from the range 3 to 13. For each $p_n \in \{3, 4, \dots, 13\}$, we obtain $\hat\theta_n$ by
the ALMCPDA, and calculate the corresponding RSS($p_n$). Choose $\arg\min_{3 \le p_n \le 13} \mathrm{RSS}(p_n)$
as the optimal $p_n$, which is 5. See Figure 5.

Based on the first step, we set $p_n = 5$ and apply the five algorithms given in Section 3 to
the data. Two change points are found based on the ALMCPDA and the CALMCPDA, which
are located at 47 and 79 (see Figure 4), corresponding to the third quarter of 1972 and the
third quarter of 1980, with RSS = 455.95. These results are consistent with those of Garcia and
Perron (1996). However, the other three algorithms LSMCPDA, SMCPDA and MMCPDA
only detect one change point, located at 47, with RSS = 1214.89. Comparing the RSSs, it is
clear that both ALMCPDA and CALMCPDA perform better than the other three
algorithms.
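
The correspondence between observation indices and calendar quarters can be checked with a few lines of Python. The helper name `index_to_quarter` is illustrative, and the sketch assumes the series is indexed from 1 starting at the first quarter of 1961.

```python
def index_to_quarter(t, start_year=1961, start_quarter=1):
    """Map a 1-based observation index to (year, quarter) for a quarterly series."""
    offset = (start_quarter - 1) + (t - 1)  # quarters elapsed since Q1 of start_year
    return start_year + offset // 4, offset % 4 + 1

print(index_to_quarter(47))   # (1972, 3)
print(index_to_quarter(79))   # (1980, 3)
print(index_to_quarter(103))  # (1986, 3), the last observation
```

The indices 47 and 79 map to the third quarters of 1972 and 1980, and a series ending in the third quarter of 1986 contains 103 observations.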




Figure 5: RSS($p_n$) against $p_n$ for the U.S. ex-post real interest rate data



5.2. Gross domestic product in the U.S.A.

The data presented in Maddala (1977, Table 10.3) give the gross domestic product (G), the
labor input index (L) and the capital input index (C) in the United States for the years 1929-
1967. logG is modeled as a linear function of logL and logC. The series logG, logL and logC are
plotted over time in Figure 6. Worsley (1983) used the likelihood ratio method to search
for change points in this data set and pointed out that the data contained two change points
located at 1942 and 1946 (RSS = 0.011). Caussinus and Lyazrhi (1997) used a Bayes invariant
optimal multi-decision procedure to detect change points in the data series and claimed three
change points located at 1938, 1944 and 1948 (RSS = 0.01).




Figure 6: Logarithms of gross domestic product (logG), labor-input index (logL) and capital-
input index (logC) in the U.S.A. for the years 1929-1967.

Since the sample size is only 39, the proposed algorithms employing least squares or the
CUSUM test may not work. Thus we only apply the first two steps of the SMCPDA or the
MMCPDA to carry out multiple change point analysis. As in the previous example, we need
to select a $p_n$. Following the recommendations in Subsection 4.3, we will choose an optimal
$p_n$ from 13 to 17. For each $p_n \in \{13, \dots, 17\}$, we obtain $\hat\theta_n^{scad}$ by the SMCPDA, and calculate
the corresponding RSS($p_n$). Choose $\arg\min_{13 \le p_n \le 17} \mathrm{RSS}(p_n)$ as the optimal $p_n$, which is 17.
With $p_n = 17$, four change points detected by applying the SMCPDA are located at 1936,
1942, 1946 and 1950 with RSS = 0.0054. With the same $p_n$, two change points detected by
applying the MMCPDA are located at 1942 and 1958 with RSS = 0.015. Thus, in terms of the
RSSs, the SMCPDA has a better performance.
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6. Conclusion 

By properly segmenting the data sequence, we proposed five multiple change point detection 
algorithms. The proposed approach is based on the following reasons. On the one hand, 
a proper segmentation can isolate the finite change points such that each change point is 
only located in one segment, and a connection between multiple change point detection and 
variable selection can be established. Thus the recent advances in consistent variable selection 
methods such as SCAD, adaptive LASSO and MCP can be used to detect these change points 
simultaneously. On the other hand, a refining procedure using a method such as CUSUM 
can improve the accuracy of change point estimates. Compared with other change point
detection methods, which are very time consuming, the newly proposed algorithms are much
faster, more effective, and have strong theoretical support. The proposed approach can be
extended to detect multiple change points in other models such as generalized linear models
and nonparametric models without any extra difficulties.
Appendix 

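A.1. CUSUM test for a single change point

Consider the following model

$$y_n = (x_{n_\ell}^T\beta_1, \dots, x_k^T\beta_1, x_{k+1}^T\beta_2, \dots, x_{n_{\ell+1}}^T\beta_2)^T + \varepsilon_n, \qquad (A.1)$$

where $y_n = (y_{n_\ell}, \dots, y_{n_{\ell+1}})^T$, $x_{n_\ell}, x_{n_\ell+1}, \dots, x_{n_{\ell+1}}$ are $q$-dimensional predictors, $\beta_1$ and
$\beta_2$ are unknown $q$-dimensional vectors of regression coefficients, and $\varepsilon_n = (\varepsilon_{n_\ell}, \dots, \varepsilon_{n_{\ell+1}})^T$. If
$n_\ell \le k < n_{\ell+1}$ and $\beta_1 \ne \beta_2$, there is a change point at $k$.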
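Let $N_\ell = n_{\ell+1} - n_\ell + 1$. Define

$$\hat\sigma^2_{\ell,k} = \frac{1}{N_\ell}\left[\min_{\beta}\sum_{i=n_\ell}^{k}(y_i - x_i^T\beta)^2 + \min_{\beta}\sum_{i=k+1}^{n_{\ell+1}}(y_i - x_i^T\beta)^2\right],$$

and $\hat\sigma^2_\ell = \min_{\beta}\sum_{i=n_\ell}^{n_{\ell+1}}(y_i - x_i^T\beta)^2/N_\ell$. By Theorem 3.1.1 of Csörgő and Horváth (1997), it
follows that

$$\lim_{N_\ell\to\infty} P\left\{a_\ell\Lambda_\ell^{1/2} \le x/2 + b_{\ell,q}\right\} = \exp\left(-2e^{-x/2}\right) \qquad (A.2)$$

for all $x$, where $a_\ell = (2\log\log N_\ell)^{1/2}$, $b_{\ell,q} = 2\log\log N_\ell + q(\log\log\log N_\ell)/2 - \log\Gamma(q/2)$, $\Gamma(x)$
is the Gamma function, and $\Lambda_\ell = \max_{n_\ell+q \le k \le n_{\ell+1}-q}\left\{-2\log\left(\hat\sigma^2_{\ell,k}/\hat\sigma^2_\ell\right)^{N_\ell/2}\right\}$.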

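In light of the proof of Corollary 2.1 of Hušková, Prášková and Steinebach (2007), it can
be shown that

$$\lim_{N_\ell\to\infty} P\left\{a_\ell\Lambda_\ell^{1/2} \le x/2 + b_{\ell,q}\right\} = \lim_{N_\ell\to\infty} P\left\{(\Lambda_\ell - \tilde b_{\ell,q})/\tilde a_{\ell,q} \le x\right\},$$

where $\tilde b_{\ell,q} = (b_{\ell,q}/a_\ell)^2$ and $\tilde a_{\ell,q} = b_{\ell,q}/a_\ell^2$, which jointly with (A.2) implies that

$$\lim_{N_\ell\to\infty} P\left\{(\Lambda_\ell - \tilde b_{\ell,q})/\tilde a_{\ell,q} \le x\right\} = \exp\left(-2e^{-x/2}\right).$$

By Lemma 3.1.9 of Csörgő and Horváth (1997), it can be shown that

$$\lim_{N_\ell\to\infty} P\left\{\left(\max_{n_\ell+q \le k \le n_{\ell+1}-q} N_\ell(\hat\sigma^2_\ell - \hat\sigma^2_{\ell,k}) - \tilde b_{\ell,q}\right)/\tilde a_{\ell,q} \le x\right\} = \exp\left(-2e^{-x/2}\right).$$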



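Let $T_{\ell,k} = N_\ell(\hat\sigma^2_\ell - \hat\sigma^2_{\ell,k})$ and $T_\ell = \max_{n_\ell+q \le k \le n_{\ell+1}-q} T_{\ell,k}$. Given a significance level $\alpha$, the
CUSUM test for testing if there is a change point in the model (A.1) is given in the following:
If

$$T_\ell > \tilde b_{\ell,q} + 2\tilde a_{\ell,q}\log(-2/\log(1-\alpha)),$$

there exists a $k \in \{n_\ell + q, \dots, n_{\ell+1} - q\}$ such that $\beta_1 \ne \beta_2$ in the model (A.1).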
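
The critical value follows from solving $\exp(-2e^{-x/2}) = 1 - \alpha$ for $x$, which gives $x = 2\log(-2/\log(1-\alpha))$. The sketch below computes this threshold and a brute-force version of the statistic for one segment. The function names are illustrative, the split index counts observations in the left part rather than using the text's absolute indexing, and the statistic is not rescaled by an error variance estimate; all three are assumptions of this sketch.

```python
import math
import numpy as np

def cusum_critical_value(N, q, alpha=0.05):
    """Threshold b~ + 2 a~ log(-2 / log(1 - alpha)) for a segment of N observations."""
    loglog = math.log(math.log(N))
    a = math.sqrt(2.0 * loglog)
    b = 2.0 * loglog + 0.5 * q * math.log(loglog) - math.lgamma(q / 2.0)
    b_tilde = (b / a) ** 2
    a_tilde = b / a ** 2
    return b_tilde + 2.0 * a_tilde * math.log(-2.0 / math.log(1.0 - alpha))

def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def cusum_test(X, y, alpha=0.05):
    """Return (T, k_hat, critical value, reject) for one segment."""
    N, q = X.shape
    full = rss(X, y)
    # T_k = RSS(full fit) - RSS(separate fits left and right of the split);
    # k is the number of observations in the left part.
    ks = range(q, N - q + 1)
    T_k = [full - rss(X[:k], y[:k]) - rss(X[k:], y[k:]) for k in ks]
    j = int(np.argmax(T_k))
    crit = cusum_critical_value(N, q, alpha)
    return T_k[j], ks[j], crit, T_k[j] > crit

# Toy usage: a mean shift of size 4 after observation 100.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(100), np.full(100, 4.0)]) + rng.normal(size=200)
X = np.ones((200, 1))
T, k_hat, crit, reject = cusum_test(X, y)
```

This brute-force version refits least squares at every candidate split, so its cost grows quadratically with the segment length.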

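Denote $C_\ell = \sum_{i=n_\ell}^{n_{\ell+1}} x_i x_i^T$, $\hat\beta_\ell = C_\ell^{-1}\sum_{i=n_\ell}^{n_{\ell+1}} x_i y_i$, $C_{\ell,k} = \sum_{i=n_\ell}^{k} x_i x_i^T$, $\tilde C_{\ell,k} = C_\ell - C_{\ell,k}$,
$S_{\ell,k} = \sum_{i=n_\ell}^{k} x_i(y_i - x_i^T\hat\beta_\ell)$ for $k = n_\ell + q, \dots, n_{\ell+1} - q$. By Hušková, Prášková and Steinebach
(2007),

$$T_{\ell,k} = S_{\ell,k}^T\, C_{\ell,k}^{-1}\, C_\ell\, \tilde C_{\ell,k}^{-1}\, S_{\ell,k}. \qquad (A.3)$$

Since $S_{\ell,k}$ and $C_{\ell,k}$ can be computed recursively, the computing time of $T_\ell$ is reduced to
$O(n_{\ell+1} - n_\ell)$ from $O((n_{\ell+1} - n_\ell)^2)$ by using (A.3).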
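
The recursive computation can be illustrated numerically. The sketch below takes (A.3) to be the identity $T_{\ell,k} = S_{\ell,k}^T C_{\ell,k}^{-1} C_\ell \tilde C_{\ell,k}^{-1} S_{\ell,k}$, which follows from the definitions above. It compares that quadratic form, built from running sums, with the direct difference of residual sums of squares. The simulated data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, q = 60, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=N)

def rss(Xs, ys):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    r = ys - Xs @ beta
    return float(r @ r)

# Full-segment fit: C, beta_hat and the residuals entering S_k.
C = X.T @ X
beta_hat = np.linalg.solve(C, X.T @ y)
resid = y - X @ beta_hat
full = float(resid @ resid)

C_k = np.zeros((q, q))
S_k = np.zeros(q)
max_gap = 0.0
for k in range(1, N - q + 1):
    x = X[k - 1]
    C_k += np.outer(x, x)     # running sum of x_i x_i^T
    S_k += x * resid[k - 1]   # running sum of x_i times residual
    if k < q:
        continue              # need at least q observations on the left
    direct = full - rss(X[:k], y[:k]) - rss(X[k:], y[k:])
    quad = S_k @ np.linalg.solve(C_k, C @ np.linalg.solve(C - C_k, S_k))
    max_gap = max(max_gap, abs(direct - quad))
```

Each step of the loop only updates two running sums and solves small $q \times q$ systems, so the total cost is linear in the segment length, whereas the direct computation refits two regressions for every $k$.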



A.2. Proof of Lemma 1

Denote the elements of $A_c$ by $A_c = \{r_1, r_2, \dots, r_{K_0}\}$. In view of
$a_{k,n}/n \to \tau_k$, for $k = 1, \dots, K_0$ and $m = o(n)$, by Assumption C1, it follows that



n 



Hence, 



n 



U 



T 







v 







f (t-2 - ri)iy 

{rs-r2)W 











where 



U 



1 "v-T 





\ 








1 + 1 T^T V / 



U = Wa^>0 



(k 







(A.4) 



A.3. Proof of Lemma 2

As in the proof of Lemma 1, denote $A_c = \{r_1, \dots, r_{K_0}\}$. It is easy to see that



1 ^" 

X^x^/m = — V 
m ^ — ^ 



^i=n—(p„ — 1)771+1 
\ V 

\ L^i=n—m+l 
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XixJijJi{i) ) 



Consider the first row of X^j^x^^jm. By Assumption CI, Z]r=i-'(pJ~'-T+i)™+ 



^ XixJ /m — >■ W. 



Hence For large n, 



Pn n 



1 

EE 

i=l i=l 
T Ko 



Xixjujfii) 



m 



Ko 

E 



i=n-{pn—rj+l)m+l 



n-{p,i-rj)m 

E 

i=n-(p„-rj + l)m+l 



XiXj^ 



^ 2Ko\\W\\ max 



Similarly, it can be shown that for large n and 1 ^ s ^ n. 



EJi ll^ill ' 1 ^ s ^ «l,n, 



XixJiVi^i] 



-1 i=s 



Ylf=2 ll^ill ' ^hn < S ^ a2,r. 



^ 2\\W\\ < 



II^Koll ) (',Ko-l,n < S ^ (''Ko,n, 

0, elsewhere; 



(A.5) 



^ 2K, 



max ||<5,-|| . 



(A.6) 



In view of flA.5P - flA.6p . each element of X^x^/m is bounded by 2_fro||W^|| ^^'^is^ikKq W^jW- The 
proof is complete. 



A.4. Proof of Lemma 3

By the definition of Xn, it follows that 



X„ en/Vn= ^ 
In 



I Er=i ^i^i \ 

En X £ 

i=n—pn'm+l * * 

En 
i=n-{p„ 



(p„-l)m+l ^i^i 



■m+1 



Consider the first element of $X_n^T\varepsilon_n/\sqrt{n}$. By Assumption C1, for $j = 1, \dots, q$, $\sum_{i=1}^{n} x_{i,j}^2/n \to W_{jj}$.
By applying Markov's inequality, we have $\sum_{i=1}^{n} x_{i,j}\varepsilon_i/\sqrt{n} = O_p(1)$.

In the following, we show that for any $\epsilon > 0$, there exists an $M$ such that



Pn i=P I — ;= max 



E 

i=n—{pn — k+l)m+l 



> MA < e. 
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Denote r]ij = 



n—{p„—£)m 



^ XijEi. Then we have 



Pn,j = P —j= max 



Note that for any v > u > 0, hy Assumption CI, we have 

Var ^y^^Vtj/Vrri^ ^ 2{v - u)Wjja'^ ^ 2{v - u)a'^ max Wjj, 
when n is large enough. By Lemma 2.1 of Lavielle (1999), it follows that 



P I max 



y^^p„-f+i,j/ 



CPn 



^ c/M^ < e. 



M^n/m 

which means that each element of the vector $X_n^T\varepsilon_n/\sqrt{n}$ is bounded uniformly in probability. The
proof of Lemma 3 is complete.



A.5. Proof of Theorem 3



Let u = {uq, uJ, . . . , u^^y be bounded. Put = On + 



and 



y 



2 

(\ Pn / \ Pn -. 



Let Un = argmmipn^u) = argmin — ipn{0))- Thus = On + Un/ y/n, and we only 

need to investigate the limiting behavior of Un- Write ipn{u) — il)n{'^)=Vn{u) , which can be 
expressed as 



VJu) 



( u - 2u'^^ - 2u^^^ 

n I Jn Jn 



+ 



r=l 



\dJ 



Consider the following two cases: 



Case I. For any $r \notin A_c$, $u_r = 0$;
Case II. There are some $r \notin A_c$ such that $u_r \ne 0$. Denote the number of such $r$'s by $n_c$.



We first consider Case I. By Lemmas 1-2 and the assumption that $m/\sqrt{n} \to 0$, it can
be shown that, as $n \to \infty$,

(Al) [iXlXn) u = u^^^ u^^ ^ u^^W^u^^; 

(A2) u^X^e/^ = u'^^iX^^^e)/^ -^^ ^^a^^A, where w_a^ = N{0, a^W^J; 

(A3) u^X^xJ^^O. 

Note that for any r ^ Ac, the second term of Vn{u) equals to 0. Let r E Ac- By Assumption 



C3, it follows that ^ c in probability. Since ^/n 

\Ac\ = Kq, by the assumption that Xnj-Jn 0, we have 



^ and 



\ ^ 1 

\ ^ J- 



n 









{ 













which, jointly with (Al)-(A3) above, implies that Vn{u) -^p u^W^^^^u^^ — 2u^^w^^, as 
n — )■ oo. 

We now consider Case II. By Lemmas 2-3 and the assumption that $m/\sqrt{n} \to 0$, it can
be shown that

(Bl) (iXjX.) u^Q- 

(B2) u^Xlen/V^=Op{nc)] 

(B3) —u^Xlx^/^^^. 
ric 

As argued previously, it can also be shown that 



(B4) a%E....^^v/^( 
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Now let r ^ Ac- Since \{r, dj = 0, Ur 7^ 0}| = ric, by Assumption C3 and the assumption that 
^nin/priY^'^ / y/n — 7- 00, it follows that 



- E 



n 



\dr\ 




dr 



Pn 



ric J \fn \PnJ 

r<^Ac,dr=0,Ur^O 

which, jointly with (B1)-(B4), implies that Vn{u) — t-^ 00. 
So far we have shown that

V„(u) V(u) = {"5 W-."-^ - 2<-A, Case I ^^^^ 
I oc, L^ase ii. 

It can be seen that is a convex function and has a unique minimum at ii such that u_^^ = 

and ■u_4^ = W^^Wy^^^. Since Ki(') is also a convex function and has a unique minimum denoted 

by Un, by (jAlD, 

tin = argmin\4(w) — T-p argmin = li, 

and hence, 

{Un)Ac -^P = ^^aI'^Ac and {Un)Ac = 0- 

In view of the fact that $w_{\mathcal{A}} \sim N(0, \sigma^2 W_{\mathcal{A}})$, the proof is complete.

Table 2: The entries are the numbers of $\hat a_i$ such that $|\hat a_i - a_{i,n}| \le 0, 5, 10$ for $i = 1, \dots, 9$, the
number of times the number of change points is correctly estimated, and the corresponding average
computation time for each of the five algorithms LSMCPDA, ALMCPDA, CALMCPDA,
SMCPDA and MMCPDA, based on 1000 simulations for the change point location setting
CPL1.

Table 3: The entries are the numbers of $\hat a_i$ such that $|\hat a_i - a_{i,n}| \le 0, 5, 10$ for $i = 1, \dots, 9$, the
number of times the number of change points is correctly estimated, and the corresponding average
computation time for each of the five algorithms LSMCPDA, ALMCPDA, CALMCPDA,
SMCPDA and MMCPDA, based on 1000 simulations for the change point location setting
CPL2.
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